library(forecast)
source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt")
#--- Linear trend only
X <- 2+5*(1:200)
layout(matrix(1:2, 1,2))
plot(X, type="o")
plot(diff(X, 12), type="o")

#--- Quadratic trend only
X <- 2+3*(1:200)^2
layout(matrix(1:3,1,3))
plot(X, type="o")
plot(diff(X, 12), type="o")
plot(diff(diff(X, 12)), type="o")

Suppose \(s=12\), and your observation \(Y_t\) has a linear trend, \[ Y_t = a + bt + S_t + e_t \qquad \mbox{ where } e_t \sim N(0,1). \]
Taking \(\bigtriangledown_{12}\) yields \[ \begin{align} Y_{t+12}-Y_{t} &= \quad a+b(t+12) + S_{t+12} + e_{t+12} \\ &\quad - \big( a+bt + S_t + e_t \big) \\ &= \quad 12b + e_{t+12} - e_{t}, \end{align} \] since \(S_{t+12} = S_t\) cancels the seasonal term.
What kind of process is this? \[ K_t = e_{t} - e_{t-12} \]
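A quick simulation (a sketch; the seed and sample size are arbitrary) suggests the answer: \(K_t\) is a seasonal MA(1) with coefficient \(-1\), a unit root in the MA polynomial, hence non-invertible. Its theoretical ACF is \(-0.5\) at lag 12 and 0 at every other nonzero lag.

```r
# K_t = e_t - e_{t-12}: seasonal MA(1) with a unit root (non-invertible).
# Theoretical ACF: -0.5 at lag 12, 0 at all other nonzero lags.
set.seed(1)                                # arbitrary seed
e <- rnorm(5000)
K <- e[13:5000] - e[1:4988]                # K_t = e_t - e_{t-12}
r <- acf(K, lag.max = 12, plot = FALSE)$acf
r[13]                                      # sample ACF at lag 12, near -0.5
```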
Linear regression with seasonal averages would be a better approach.
library(forecast)
source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt")
#--- Linear trend + WN
set.seed(3462)
X <- 2+.5*(1:200) + rnorm(200,0,5)
layout(matrix(1:2,1,2))
plot(X, type="o")
plot(diff(X, 12), type="o")
## Series: diff(X, 12)
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 6.1451
## s.e. 0.4902
##
## sigma^2 estimated as 45.41: log likelihood=-624.95
## AIC=1253.9 AICc=1253.96 BIC=1260.37
Fit1 <- Arima(diff(X, 12), order=c(0,0,0), seasonal=c(0,0,1)) #- try forcing sMA(1)
Fit1 #- doesn't work: diff(X, 12) has frequency 1, so the seasonal term is ignored
## Series: diff(X, 12)
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 6.1451
## s.e. 0.4902
##
## sigma^2 estimated as 45.41: log likelihood=-624.95
## AIC=1253.9 AICc=1253.96 BIC=1260.37
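Why the forced sMA(1) "doesn't work": `diff(X, 12)` built from a plain vector has frequency 1, so the seasonal part of the model is silently dropped. A sketch using base R's `arima()`, which accepts an explicit period (same seed as the example above), shows what the intended fit would give: an sMA(1) coefficient near \(-1\), the unit root expected after seasonally differencing a deterministic trend.

```r
# Base R's arima() lets us state the seasonal period explicitly,
# even when the series has frequency 1.
set.seed(3462)
X   <- 2 + .5*(1:200) + rnorm(200, 0, 5)
D   <- diff(X, 12)
fit <- arima(D, order = c(0, 0, 0),
             seasonal = list(order = c(0, 0, 1), period = 12))
coef(fit)["sma1"]   # near -1: unit root from over-differencing the trend
```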
X2 <- ts(X, start=c(1,1), freq=12) # <- set freq=12
Fit2 <- auto.arima(diff(X2, 12), stepwise=FALSE) #- now the seasonal term is picked up
Fit2
## Series: diff(X2, 12)
## ARIMA(0,0,0)(2,0,0)[12] with non-zero mean
##
## Coefficients:
## sar1 sar2 mean
## -0.6780 -0.2089 6.1699
## s.e. 0.0719 0.0747 0.2172
##
## sigma^2 estimated as 29.7: log likelihood=-586.82
## AIC=1181.63 AICc=1181.85 BIC=1194.58
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.363 0.152 0.115 0.107 0.173 0.608 5.42
## Series: diff(X2, 12)
## ARIMA(0,0,0)(0,0,1)[12] with non-zero mean
##
## Coefficients:
## sma1 mean
## -1.0000 6.1078
## s.e. 0.0933 0.0683
##
## sigma^2 estimated as 21.79: log likelihood=-572.31
## AIC=1150.61 AICc=1150.74 BIC=1160.32
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.929 0.854 0.875 0.143 0.214 0.102 4.656
Suppose \(s=12\), and your observation \(Y_t\) has a random walk without drift as its trend, together with seasonality: \[ Y_t = W_t + S_t + X_t, \] where \[ W_t = \sum_{i=1}^t e_i, \qquad e_i \sim_{iid} N(0,\sigma^2). \]
If you take the seasonal difference, \[ \begin{align} \bigtriangledown_{12} Y_{t} &= \quad W_t + S_t + X_t \\ &\quad - W_{t-12} - S_{t-12} - X_{t-12} \\ &= \quad (W_t - W_{t-12}) + (X_t - X_{t-12}). \end{align} \]
The first part is \[ W_{t} - W_{t-12} = \sum_{i=0}^{t-1} e_{t-i} - \sum_{i=12}^{t-1} e_{t-i} = \sum_{i=0}^{11} e_{t-i} \sim N(0, 12\sigma^2). \]
This is an MA(11) with all coefficients equal to 1; its MA polynomial has roots on the unit circle, so it is non-invertible.
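A sketch (arbitrary seed and length) checking both claims: the variance of the lag-12 difference is about \(12\sigma^2\), and its ACF decays roughly linearly, \(\rho(h) \approx (12-h)/12\) for \(h \le 12\).

```r
# The lag-12 difference of a random walk is a moving sum of 12
# consecutive errors.
set.seed(1)                                # arbitrary seed
W <- cumsum(rnorm(20000))                  # random walk, sigma = 1
D <- diff(W, 12)
var(D)                                     # near 12 = 12 * sigma^2
acf(D, lag.max = 12, plot = FALSE)$acf[2]  # lag-1 ACF near 11/12
```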
library(forecast)
source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt")
#--- RW without drift ---
set.seed(5351)
RW1 <- cumsum(rnorm(200, 0, 1))
layout(matrix(1:2, 1,2))
plot(RW1, type="l")
plot(diff(RW1, 12), type="o")
RW2 <- ts(RW1, start=c(1,1), freq=12) # <- set freq=12 for the fits below
## Series: diff(RW2, 12)
## ARIMA(1,0,0)(1,0,0)[12] with non-zero mean
##
## Coefficients:
## ar1 sar1 mean
## 0.9458 -0.5853 -0.5674
## s.e. 0.0224 0.0590 0.9006
##
## sigma^2 estimated as 1.301: log likelihood=-293.38
## AIC=594.75 AICc=594.97 BIC=607.7
## Series: diff(RW2, 12)
## ARIMA(0,0,12) with non-zero mean
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 0.9804 0.9841 1.0436 1.0251 1.0400 1.0394 1.0251 1.0419
## s.e. 0.0877 0.0927 0.1070 0.1184 0.1244 0.1202 0.1198 0.1100
## ma9 ma10 ma11 ma12 mean
## 0.9816 0.9795 0.9610 -0.0382 -0.5658
## s.e. 0.1061 0.0964 0.1055 0.0826 0.7925
##
## sigma^2 estimated as 0.9217: log likelihood=-269.06
## AIC=566.11 AICc=568.54 BIC=611.42
#--- RW with drift ---
set.seed(5351)
RW.d <- ts(cumsum(rnorm(200, .5, 1)), start=c(1,1), freq=12)
plot(RW.d, type="l")
plot(diff(RW.d, 12), type="o")
## Series: diff(RW.d, 12)
## ARIMA(1,0,0)(1,0,0)[12] with non-zero mean
##
## Coefficients:
## ar1 sar1 mean
## 0.9458 -0.5853 5.4326
## s.e. 0.0224 0.0590 0.9006
##
## sigma^2 estimated as 1.301: log likelihood=-293.38
## AIC=594.75 AICc=594.97 BIC=607.7
## Series: diff(RW.d, 12)
## ARIMA(0,0,11) with non-zero mean
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 1.0112 1.0141 1.0733 1.0555 1.0670 1.0670 1.0555 1.0733
## s.e. 0.0645 0.0685 0.0922 0.1024 0.1181 0.1094 0.1082 0.0910
## ma9 ma10 ma11 mean
## 1.0141 1.0112 1.000 5.4364
## s.e. 0.0864 0.0689 0.072 0.8175
##
## sigma^2 estimated as 0.9172: log likelihood=-269.16
## AIC=564.33 AICc=566.42 BIC=606.4
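With drift \(\mu\), the mean of the lag-12 difference is \(12\mu\) (here \(12 \times 0.5 = 6\)), consistent with the fitted means around 5.4 above. A quick check using the same seed:

```r
# Mean of the seasonal difference of a RW with drift mu is 12*mu.
set.seed(5351)
RW.d <- cumsum(rnorm(200, .5, 1))
mean(diff(RW.d, 12))                       # theory: 12 * .5 = 6
```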
Monthly CO2 levels at Alert, Northwest Territories, Canada, from January 1994 through December 2004.
acf1 <- acf; library(TSA); acf <- acf1
data(co2) #- we are using co2 data in TSA package (there's another one in R)
library(forecast)
source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt")
plot(co2, type="o", main='CO2 Data') #- take Del_12 -
plot(diff(co2, 12), main='(Del_12) CO2', xlab='Time')
Arima(co2, order=c(0,0,0), seasonal=c(0,1,1)) # sMA(1) not too close to 1
## Series: co2
## ARIMA(0,0,0)(0,1,1)[12]
##
## Coefficients:
## sma1
## 0.375
## s.e. 0.072
##
## sigma^2 estimated as 3.795: log likelihood=-250.49
## AIC=504.98 AICc=505.08 BIC=510.56
## Series: co2
## ARIMA(0,0,11)(0,1,0)[12]
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 0.7503 0.5927 0.6235 0.7815 0.8266 0.8043 0.8355 0.8013
## s.e. 0.1028 0.1221 0.1157 0.1170 0.1113 0.1368 0.1101 0.1189
## ma9 ma10 ma11
## 0.6169 0.4684 0.7954
## s.e. 0.1208 0.1151 0.0955
##
## sigma^2 estimated as 0.6845: log likelihood=-150.54
## AIC=325.08 AICc=328 BIC=358.53
## Series: co2
## ARIMA(0,0,11)(0,1,0)[12] with drift
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 0.3972 0.6207 0.2726 0.7195 0.4868 0.7640 0.4740 0.8061
## s.e. 0.1611 0.1247 0.1690 0.0886 0.1660 0.1241 0.1696 0.1164
## ma9 ma10 ma11 drift
## 0.1842 0.5929 0.3300 0.1430
## s.e. 0.1883 0.1446 0.2036 0.0371
##
## sigma^2 estimated as 0.6598: log likelihood=-145.02
## AIC=316.04 AICc=319.48 BIC=352.28
## Series: co2
## ARIMA(0,0,0)(0,1,1)[12] with drift
##
## Coefficients:
## sma1 drift
## -0.9998 0.1527
## s.e. 0.1653 0.0018
##
## sigma^2 estimated as 0.6634: log likelihood=-157.82
## AIC=321.64 AICc=321.85 BIC=330
Taking \(\bigtriangledown_{12}\) leaves a significant drift term with an sMA(1) coefficient close to \(-1\).
This is a good sign that the original co2 series has a deterministic linear trend, \[ Y_t = a + bt + S_t + e_t. \]
#--- Take Monthly Averages
X0 <- co2
Mav1 <- aggregate(c(X0), list(month=cycle(X0)), mean)$x #- 1yr long Mtly Av
M.av1 <- ts(Mav1[cycle(X0)], start=start(X0), freq=frequency(X0)) #- Mtly Av
Ds.X0 <- X0-M.av1 #- Subtract from original
plot(X0, type="o") # original
lines(M.av1, col="blue") # seasonal averages
Ds.co2 <- Ds.X0 #- de-seasonalized co2
Reg1 <- lm(Ds.co2 ~ time(Ds.co2)) #- fit a linear trend to it (output below)
summary(Reg1)
##
## Call:
## lm(formula = Ds.co2 ~ time(Ds.co2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.22331 -0.69056 -0.01021 0.64085 2.36388
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.633e+03 5.112e+01 -71.07 <2e-16 ***
## time(Ds.co2) 1.817e+00 2.557e-02 71.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9327 on 130 degrees of freedom
## Multiple R-squared: 0.9749, Adjusted R-squared: 0.9747
## F-statistic: 5052 on 1 and 130 DF, p-value: < 2.2e-16
## Series: Reg1$resid
## ARIMA(1,0,0) with zero mean
##
## Coefficients:
## ar1
## 0.5273
## s.e. 0.0739
##
## sigma^2 estimated as 0.6228: log likelihood=-155.71
## AIC=315.42 AICc=315.51 BIC=321.19
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.386 0.13 0.074 0.691 0.817 0.416 0.789
Resid.co2 <- ts(Reg1$resid, start=c(1994,1), freq=12)
Fit12 <- auto.arima( Resid.co2, stepwise=FALSE )
Fit12## Series: Resid.co2
## ARIMA(2,0,0)(2,0,0)[12] with zero mean
##
## Coefficients:
## ar1 ar2 sar1 sar2
## 0.3858 0.2620 0.2460 0.2002
## s.e. 0.0852 0.0891 0.0901 0.0924
##
## sigma^2 estimated as 0.5589: log likelihood=-148.2
## AIC=306.4 AICc=306.88 BIC=320.82
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.758 0.425 0.487 0.641 0.602 0.904 0.739
## Series: Resid.co2
## ARIMA(2,0,0)(1,0,0)[12] with zero mean
##
## Coefficients:
## ar1 ar2 sar1
## 0.3970 0.2333 0.2862
## s.e. 0.0867 0.0916 0.0952
##
## sigma^2 estimated as 0.5791: log likelihood=-150.46
## AIC=308.92 AICc=309.24 BIC=320.45
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from a Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.547 0.277 0.238 0.373 0.408 0.847 0.755
\(s=12\) because freq=12 was set in the ts object.
Took \(\bigtriangledown_{12}\), then checked the sMA(1) parameter and the MA(11) parameters.
The drift term (the mean after the seasonal difference) was significant. With the drift, the sMA(1) parameter was close to \(-1\), indicating that the original series has a linear trend with WN \(e_t\).
With observation \(Y_t\), \[ Y_t = L_t + S_t + X_t, \] where \(X_t\) was \(ARMA(2,0) \times (1,0)_{12}\): \[ (1-\phi_1 B - \phi_2 B^2)(1-\Phi_1 B^{12}) X_t = e_t. \]
Suppose we have an ARIMA\((1,0,1) \times (1,1,0)_{12}\) model, \[ (1-\phi_1 B) (1-\Phi_1 B^{12}) \bigtriangledown_{12} Y_t = (1-\theta_1 B) e_t. \]
This is the same as modeling \(\bigtriangledown_{12} Y_t\) with ARMA(13,1): \[ X_t = \bigtriangledown_{12} Y_t, \qquad \Big(1-\phi_1 B - \Phi_1 B^{12} + \phi_1 \Phi_1 B^{13} \Big) X_t = (1-\theta_1 B) e_t. \]
We know how to get a prediction \(\hat X(h)\) for ARMA(\(p,q\)). Our predictor for \(Y_t\) will be \[ \begin{align} \hat Y_{n+1} &= Y_{n+1-12} + \hat X(1) \\ \hat Y_{n+2} &= Y_{n+2-12} + \hat X(2) \\ &\vdots \end{align} \]
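A sketch of this reconstruction using base R only (the simulated series and the ARMA(1,1) order are illustrative, not from the notes): fit an ARMA model to \(X_t = \bigtriangledown_{12} Y_t\), predict \(\hat X(1),\dots,\hat X(12)\), then add back \(Y_{n+h-12}\).

```r
# Forecast Y via its seasonal difference: model X_t = Y_t - Y_{t-12},
# predict X, then add back the observation from 12 steps earlier.
set.seed(1)                                   # illustrative data
n <- 120
S <- rep(10 * sin(2 * pi * (1:12) / 12), length.out = n)  # seasonality
Y <- 0.1 * (1:n) + S + arima.sim(list(ar = 0.5), n)
X <- diff(Y, 12)                              # seasonally differenced series
fit  <- arima(X, order = c(1, 0, 1))          # ARMA(1,1) on X_t
Xhat <- predict(fit, n.ahead = 12)$pred       # X-hat(1), ..., X-hat(12)
Yhat <- Y[(n - 11):n] + Xhat                  # Y-hat(h) = Y_{n+h-12} + X-hat(h)
```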
How do you check if you need to take a seasonal difference, instead of regular difference?
OCSB test (\(H_0\): Seasonal Unit Root Exists)
CH test (\(H_0\): Deterministic Seasonality Exists)
For more detail: http://robjhyndman.com/hyndsight/forecast3/
Osborn-Chui-Smith-Birchenhall (1988) test for a seasonal unit root.
Default choice for selecting the value of \(D\) in auto.arima.
\[
\left\{
\begin{array}{ll}
H_0: \mbox{ Seasonal Unit Root Exists}\\
H_A: H_0 \mbox{ is false} \\
\end{array} \right.
\]
Small p-value means “Take Seasonal Difference”.
data(co2) #- No need to load package for this data
library(forecast)
source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt")
plot(co2, type="o")
nsdiffs(co2, m=12, test="ocsb")
auto.arima(co2)
Fit1 <- Arima(co2, order=c(0,1,1), seasonal=c(0,1,1))
Fit1
plot(forecast(Fit1))
forecast(Fit1)
plot(diff(co2, 12))
Stationarity.tests(diff(co2, 12))
auto.arima(co2, d=0)
Fit2 <- Arima(co2, order=c(1,0,0), seasonal=c(2,1,1))
Fit2